
 random forest classifier


SM.1 Omitted proofs SM.1.1 Proof of Proposition 1 Proposition 1. The function $m_C(\cdot) = 2^{C(\mathcal{M}_\epsilon(\cdot))} : \mathcal{X} \to [1, c]$ satisfies all properties of a predictive multiplicity metric in Definition 1

Neural Information Processing Systems

For clarity, we assume $|\mathcal{M}_\epsilon(x_i)| = m$. By the information inequality [1, Theorem 2.6.3], the mutual information $I(M;Y)$ between the random variables $M$ and $Y$ (defined in Section 3) is non-negative, i.e., $I(M;Y) \geq 0$. On the other hand, we denote the $c$ models in $\mathcal{R}(\mathcal{H},\epsilon)$ whose output scores are the "vertices" of $\Delta_c$ to be $m_1, \ldots, m_c$; then $H(Y|M = m_k) = 0$, $k \in [c]$. $H(Y|M)$ is minimized to 0 by setting the weights $p_m$ on those $c$ models to be $\frac{1}{c}$ and the rest to be 0. Since this holds for the capacity-achieving $P_M$, which in turn is the maximum across input distributions, the converse result follows. The consequence of predictive multiplicity is that the same individual can be treated differently due to arbitrary and unjustified choices made during the training process (e.g., parameter initialization, random seed, dropout probability, etc.).
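The two ends of the range $[1, c]$ in the snippet above can be checked numerically. The following is a minimal sketch, not the paper's code: the helper names `entropy` and `mutual_information` and the four-model example set are assumptions made for illustration. It evaluates $I(M;Y) = H(Y) - H(Y|M)$ in bits at specific weight choices rather than maximising over all weights.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def mutual_information(weights, outputs):
    """I(M;Y) = H(Y) - H(Y|M) for model weights p_m and per-model score rows."""
    c = len(outputs[0])
    # Marginal distribution of Y: weighted mixture of the model outputs.
    marginal = [sum(w * row[k] for w, row in zip(weights, outputs)) for k in range(c)]
    # Conditional entropy H(Y|M): weighted average of per-model entropies.
    conditional = sum(w * entropy(row) for w, row in zip(weights, outputs))
    return entropy(marginal) - conditional

c = 3
# Hypothetical set of models: c "vertex" models plus one model with uniform scores.
outputs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1 / 3, 1 / 3, 1 / 3],
]

# Weight 1/c on each vertex model and 0 on the rest, as in the proof sketch.
uniform_on_vertices = [1 / c, 1 / c, 1 / c, 0.0]
info_upper = mutual_information(uniform_on_vertices, outputs)
metric_upper = 2 ** info_upper  # approaches c

# All weight on a single model: no multiplicity, so I(M;Y) = 0.
single_model = [1.0, 0.0, 0.0, 0.0]
info_lower = mutual_information(single_model, outputs)
metric_lower = 2 ** info_lower  # equals 1
```

With weight $\frac{1}{c}$ on each vertex model, $H(Y|M) = 0$ and $H(Y) = \log_2 c$, so the metric reaches $c$. With all weight on one model it falls to 1.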




A Multilayered Approach to Classifying Customer Responsiveness and Credit Risk

Afolabi, Ayomide, Ogburu, Ebere, Kimitei, Symon

arXiv.org Machine Learning

ABSTRACT This study evaluates the performance of various classifiers in three distinct models: response, risk, and response-risk, concerning credit card mail campaigns and default prediction. In the response model, the Extra Trees classifier demonstrates the highest recall level (79.1%), emphasizing its effectiveness in identifying potential responders to targeted credit card offers. Conversely, in the risk model, the Random Forest classifier exhibits remarkable specificity of 84.1%, crucial for identifying customers least likely to default. Furthermore, in the multi-class response-risk model, the Random Forest classifier achieves the highest accuracy (83.2%), indicating its efficacy in discerning both potential responders to credit card mail campaigns and low-risk credit card users. In this study, we optimized various performance metrics to solve a specific credit risk and mail responsiveness business problem.
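For readers less familiar with the metrics named in this abstract, the sketch below shows how recall, specificity, and accuracy are computed from binary confusion-matrix counts. The counts are hypothetical values chosen for illustration. They are not taken from the study.

```python
def recall(tp, fn):
    """Share of actual positives that the classifier identifies."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of actual negatives that the classifier identifies."""
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion-matrix counts (illustrative only).
tp, fn, tn, fp = 80, 20, 90, 10

example_recall = recall(tp, fn)
example_specificity = specificity(tn, fp)
example_accuracy = accuracy(tp, tn, fp, fn)
```

Recall matters most when missing a positive case (a potential responder) is costly. Specificity matters most when wrongly flagging a negative case is costly.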


Modelling the Doughnut of social and planetary boundaries with frugal machine learning

Vrizzi, Stefano, O'Neill, Daniel W.

arXiv.org Artificial Intelligence

The 'Doughnut' of social and planetary boundaries has emerged as a popular framework for assessing environmental and social sustainability. Here, we provide a proof-of-concept analysis that shows how machine learning (ML) methods can be applied to a simple macroeconomic model of the Doughnut. First, we show how ML methods can be used to find policy parameters that are consistent with 'living within the Doughnut'. Second, we show how a reinforcement learning agent can identify the optimal trajectory towards desired policies in the parameter space. The approaches we test, which include a Random Forest Classifier and $Q$-learning, are frugal ML methods that are able to find policy parameter combinations that achieve both environmental and social sustainability. The next step is the application of these methods to a more complex ecological macroeconomic model.


Enhancing Breast Cancer Prediction with LLM-Inferred Confounders

Roy, Debmita

arXiv.org Artificial Intelligence

Wheeler High School, Marietta, GA. Abstract: This study enhances breast cancer prediction by using large language models to infer the likelihood of confounding diseases, namely diabetes, obesity, and cardiovascular disease, from routine clinical data. These AI-generated features improved Random Forest model performance, particularly for LLMs like Gemma (3.9%) and Llama (6.4%). The approach shows promise for noninvasive prescreening and clinical integration, supporting improved early detection and shared decision-making in breast cancer diagnosis. Introduction: Breast cancer (BC) is a leading cause of death among women in the U.S., with most cases having unknown causes despite known risk factors [1]. Researchers have identified correlations between BC and various clinical features and biomarkers, such as body mass index, glucose, insulin, leptin, adiponectin, resistin, MCP-1, and HOMA, that can be measured through routine blood tests.




Supplementary Materials Rashomon Capacity: A Metric for Predictive Multiplicity in Classification

Neural Information Processing Systems

(since we pick the log base to be 2). We now prove the converse statements. Individual fairness aims to ensure that "similar individuals are treated similarly." Predictive multiplicity allows different predictions from competing classifiers for the samples. Notably, neural networks with very narrow or wide layers have better reproducibility in their decision regions. The fact that multiple classifiers may yield distinct predictions for a target sample while having statistically identical average loss performance can also cause security issues in machine learning.


Disjoint Generative Models

Lautrup, Anton Danholt, Rajabinasab, Muhammad, Hyrup, Tobias, Zimek, Arthur, Schneider-Kamp, Peter

arXiv.org Artificial Intelligence

We propose a new framework for generating cross-sectional synthetic datasets via disjoint generative models. In this paradigm, a dataset is partitioned into disjoint subsets that are supplied to separate instances of generative models. The results are then combined post hoc by a joining operation that works in the absence of common variables/identifiers. The success of the framework is demonstrated through several case studies and examples on tabular data that help illuminate some of the design choices that one may make. The principal benefit of disjoint generative models is significantly increased privacy at only a low utility cost. Additional findings include increased effectiveness and feasibility for certain model types and the possibility for mixed-model synthesis.
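The partition, generate, and join pipeline described in this abstract can be sketched in a few lines. Everything below is an assumption made for illustration: the function names are hypothetical, bootstrap resampling stands in for a real generative model, and joining by row position is only one simple way to combine parts without a shared identifier. It is not necessarily the joining operation the authors use.

```python
import random

def partition_columns(columns, n_parts):
    """Split column names into n_parts disjoint subsets (round-robin)."""
    return [columns[i::n_parts] for i in range(n_parts)]

def bootstrap_generator(rows, n_samples, rng):
    """Stand-in 'generative model': resample rows with replacement."""
    return [dict(rng.choice(rows)) for _ in range(n_samples)]

def disjoint_synthesis(data, n_parts, n_samples, seed=0):
    """Generate each column partition separately, then join by row position."""
    rng = random.Random(seed)
    columns = list(data[0].keys())
    synthetic_parts = []
    for part in partition_columns(columns, n_parts):
        # Each generator instance sees only its own disjoint set of columns.
        subset = [{col: row[col] for col in part} for row in data]
        synthetic_parts.append(bootstrap_generator(subset, n_samples, rng))
    # Positional join: no common variable or identifier is required.
    joined = []
    for pieces in zip(*synthetic_parts):
        record = {}
        for piece in pieces:
            record.update(piece)
        joined.append(record)
    return joined

# Hypothetical toy dataset (illustrative only).
data = [
    {"age": 34, "income": 52000, "city": "A", "owns_home": True},
    {"age": 51, "income": 61000, "city": "B", "owns_home": False},
    {"age": 27, "income": 38000, "city": "A", "owns_home": False},
]
synthetic = disjoint_synthesis(data, n_parts=2, n_samples=5)
```

Because each partition is sampled independently in this sketch, cross-partition correlations are not preserved. That loss is the kind of utility cost the abstract weighs against the privacy gain.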